Synchronizing secondary audiovisual content based on frame transitions in streaming content

ABSTRACT

According to some aspects, a secondary device may display secondary audiovisual content along with playback of audiovisual content on a primary device. For example, the secondary device may display an augmented reality application synchronized with the video. Aspects may predetermine a set of frame transition ranges for the video, where each respective frame transition range is determined based on frames of the video that are determined to be substantially identical by a frame reference function and frames that are determined to be different. Two frames may be substantially identical even if they are different in the source video. This may be due to shortcomings in the frame reference function, or encoding/compression losses in transmission and playback of the video. Playback may be synchronized based on a first detected frame, but synchronization may be refined upon detecting a frame transition to a second frame that is no longer substantially identical to prior frames.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. Application Serial No. 17/388,890 filed on Jul. 29, 2021, which is a continuation of prior U.S. Application Serial No. 17/147,178 filed on Jan. 12, 2021, the entirety of which is incorporated herein by reference.

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

FIELD OF USE

Aspects of the disclosure relate generally to synchronizing secondary content playback with primary content playback. More specifically, aspects of the disclosure provide methods and techniques for synchronizing playback of an augmented reality application, on a secondary device, corresponding to video playback on another device based on detecting frame transitions in the video playback.

BACKGROUND

Augmented reality (AR) technologies allow computing devices to display audiovisual experiences, sometimes interactive, overlaid on captured video of the real world around a user. AR applications have been used in many industries to present users with additional information and enhanced experiences, promoting products or digitizing experiences. AR applications can serve as “second screen” experiences for television content, for example where additional information about a television show is displayed on a second device in synchronization with the television show.

AR applications that synchronize with other displayed content must determine a playback location of the displayed content so that corresponding AR content can be displayed. Several techniques exist for synchronization, such as embedded codes added to a video for the AR application to detect and determine a playback location. However, these embedded codes often change the nature of the content and can be intrusive to the user experience. Software toolkits, such as Apple’s ARKit for iOS, are available to provide functions supporting AR experiences on common user devices.

Music videos are a popular form of entertainment, allowing users to experience a combined audiovisual work tying a song to often interesting video. Originally consumed by users on television channels such as MTV, today music videos are streamed online at popular websites such as YouTube and Vevo. And streaming radio sites such as Spotify have largely replaced personal MP3 collections, CD collections, or FM radio as users’ preferred way to listen to music.

The nature of the content of the video may lead to some consecutive frames of the video being identical as to content. For example, a title screen may have static content and appear for several seconds. Beyond actually identical frames, minor differences frame-to-frame, even if part of a bigger movement, might be imperceptible to users on a frame-by-frame basis. These same imperceptible differences could be programmatically identified by a pixel-by-pixel comparison. But analyzing every pixel of a frame is computationally taxing and often infeasible in application. Thus applications may perform image recognition on only part of the image, such as through a sampling technique or aggregation. These frame-to-frame issues may be further exacerbated by streaming video/radio platforms. Streaming video/radio services must balance audiovisual quality and fidelity with Internet bandwidth and speed limitations and goals. As a result, a music video recorded at a very high resolution with nuanced details might be streamed at a lower resolution or have other details removed by video compression techniques. Similarly, high quality audio may be streamed at a lower bitrate to balance bandwidth considerations. Many of the resulting changes in the content are unnoticeable by most users, or may otherwise not disrupt the viewing/listening experience. But these limitations may further complicate the ability of image recognition techniques to discern distinctions from frame to frame.

Aspects herein may provide an AR application displayed in synchronization with a music video or radio stream. One problem presented by the AR platforms described above is that they must embed tags or other codes into content to allow the AR application to synchronize. This may require additional processing of the content by a content producer, and may negatively impact the user experience. Even if encoded within the content in a manner unobtrusive to the user, these systems still require modification of the source content and cannot be flexibly applied to existing content already on a streaming platform without modifying the content. Aspects described herein may leverage the shortcomings of Internet streaming platforms to address these and other problems, and generally improve the quality, efficiency, and adaptability of AR applications presenting secondary content in synchronization with audiovisual content from a streaming service.

SUMMARY

The following presents a simplified summary of various aspects described herein. This summary is not an extensive overview, and is not intended to identify key or critical elements or to delineate the scope of the claims. The following summary merely presents some concepts in a simplified form as an introductory prelude to the more detailed description provided below.

Aspects discussed herein may relate to methods and techniques for displaying secondary audiovisual content in synchronization with other media content. For example, aspects described herein may provide a system for displaying an augmented reality application, on a secondary device, in synchronization with playback of a music video on a primary device. In a streaming video, multiple consecutive frames may be so similar that a frame reference function (also referred to as a frame reference identification function) provided by an AR toolkit determines them to be substantially identical, even if the content of the frames does differ. The frame reference function may be a behind-the-scenes part of the AR toolkit, but serves to determine the identity of a captured frame. Synchronization cues may be taken from determining frame transitions, accounting for the imperfect nature of streaming content and its impact on content matching functions, according to some aspects. Frame transitions that are recognizable by the AR toolkit may be predetermined for a video, generating a set of frame transition ranges. Then, the secondary device may synchronize playback based on a first detected frame, but synchronization may be refined upon detecting a frame transition to a second frame that is no longer substantially identical to prior frames.

Thus, some aspects may provide a computer-implemented method to synchronize playback between secondary audiovisual content and a video. The method may comprise determining a set of frame transition ranges for a video. Each respective frame transition range may comprise a starting frame identifier and an ending frame identifier associated with a series of frames of the video that are each determined to be substantially identical by a predefined frame reference function. The frame reference function may be an image matching function provided as part of an AR toolkit, such as functionality using the ARReferenceImage element in the ARKit for iOS provided by Apple. For example, a frame reference function may be configured to retrieve a current frame and compare it to an ARReferenceImage to determine if the frame matches a reference image. The frame reference function may, in some implementations, be part of the AR toolkit’s environment and might not be expressly called by the application developer. Instead, the frame reference function may be functionality built into other components of the AR toolkit, such as a component that recognizes when a known reference frame is present in captured video content. The video played back on the primary device may be a compressed video file streamed over the Internet via a network connection. The video file may be in a lower bitrate encoding, or include compression artifacts causing nuanced differences in contiguous frames of a source video to be lost in the streaming video file, for example. Similar-but-not-identical frames may nonetheless be deemed substantially identical by the predefined frame reference function due to how the reference function is configured. The predefined frame reference function may determine the contiguous series of frames to be substantially identical based on the streaming file’s omission of these nuanced differences due to, e.g., compression artifacts or quality of the streaming file. For example, the reference function may determine two frames to be substantially identical if they match within a certain threshold percentage.

According to some aspects, each frame in the frame transition range may be determined to be substantially identical to each other frame in the frame transition range based on the predefined frame reference function determining that the frames are identical. The set of frame transition ranges may be predetermined based on processing frames of the video to determine series of contiguous frames that are deemed substantially identical by the predefined frame reference function. In a given frame transition range, the starting frame identifier may correspond to the first frame of the series of contiguous frames that differs from the frames of a prior transition range based on the predefined frame reference function, and the ending frame identifier may correspond to the last frame of the series of contiguous frames prior to a different frame of a next transition range based on the predefined frame reference function.

The method may comprise capturing, by a secondary device and at a first time, a currently displayed first frame of the video during playback of the video by a primary device. The secondary device may determine, based on the predefined frame reference function, a first frame transition range corresponding to the captured first frame of the video. The secondary device may synchronize secondary audiovisual content (such as an augmented reality application) with the playback of the video based on the first frame transition range corresponding to the captured first frame of the video. Synchronizing the secondary audiovisual content may, for example, comprise causing events in the AR application to be displayed on the secondary device when a corresponding event is displayed on the primary device. For example, the secondary device may cause events in an AR application to be displayed in coordination with the current playback position of the video.

The method may further comprise capturing, by the secondary device and at a second time after the first time, a currently displayed second frame of the video during the playback of the video by the primary device. The secondary device may determine whether the captured second frame corresponds to a current frame transition range corresponding to a current playback position of the secondary audiovisual content based on the predefined frame reference function. And the secondary device may synchronize, based on determining that the captured second frame does not correspond to the current frame transition range, the secondary audiovisual content with the playback of the video based on a second frame transition range identified as corresponding to the captured second frame based on the predefined frame reference function. For example, the AR application may be synchronized to a starting frame of the second frame transition range upon recognizing the captured second frame and corresponding frame transition.

According to some aspects, the method may further comprise capturing, by the secondary device and at a third time between the first time and the second time, a currently displayed third frame of the video during the playback of the video by the primary device. The secondary device may determine whether the captured third frame corresponds to the current frame transition range based on the predefined frame reference function. And, based on determining that the captured third frame does correspond to the current frame transition range, the secondary device may continue playback of the secondary audiovisual content synchronized based on the first frame transition range corresponding to the captured first frame of the video.

According to some aspects, the method may further comprise capturing, by the secondary device and at a third time, a currently displayed third frame of the video during the playback of the video by the primary device. The secondary device may determine that the captured third frame corresponds to the first frame transition range based on the predefined frame reference function. The secondary device may determine that a time period between the first time and the third time exceeds a duration of the first frame transition range. Based on determining that the time period exceeds the duration of the first frame transition range, the secondary device may pause playback of the secondary audiovisual content. While playback of the secondary audiovisual content is paused, the secondary device may capture a currently displayed fourth frame of the video during the playback of the video by the primary device. The secondary device may determine that the captured fourth frame corresponds to a different frame transition range, other than the first frame transition range, based on the predefined frame reference function. And based on determining that the captured fourth frame corresponds to the different frame transition range, the secondary device may resume playback of the secondary audiovisual content based on the different frame transition range.

In some implementations, synchronizing the secondary audiovisual content with the playback of the video based on the first transition range may comprise synchronizing the secondary audiovisual content with the playback of the video based on the starting frame identifier of the first frame transition range. In some implementations, synchronizing the secondary audiovisual content with the playback of the video based on the first transition range may comprise selecting a frame identifier between the starting frame identifier of the first frame transition range and the ending frame identifier of the first frame transition range based on selection criteria and synchronizing the secondary audiovisual content with the playback of the video based on the selected frame identifier.

In some implementations, synchronizing the secondary audiovisual content may be based on audio output by the primary device and associated with the playback of the video. For example, due to similar limitations in Internet streaming of audio content, the video frame transition synchronization techniques described above may be adapted to match reference audio portions captured by the secondary device. This audio synchronization method may be utilized to, e.g., display an AR application in synchrony with streamed music. This audio synchronization may also be used in conjunction with the video synchronization to determine a more accurate result.

Aspects may provide a computer-implemented method to synchronize display of secondary audiovisual content with playback of a video based on detecting frame transitions. The method may comprise determining a set of frame transitions for a video. Each frame transition may correspond to a respective starting frame that is determined to be different from a prior frame based on the predefined frame reference function. The method may further comprise capturing, by a secondary device and at a first time, a currently displayed first frame of the video during playback of the video by a primary device. The secondary device may determine a first playback position of the video based on a first frame transition, of the set of frame transitions, corresponding to the captured first frame of the video. The secondary device may synchronize secondary audiovisual content with the playback of the video based on the first frame transition corresponding to the captured first frame of the video. The method may further comprise capturing, by the secondary device and at a second time, a currently displayed second frame of the video during the playback of the video by the primary device. The secondary device may determine a second frame transition, of the set of frame transitions, corresponding to the captured second frame when the predefined frame reference function indicates that the captured second frame is different from the captured first frame. The secondary device may synchronize, based on determining the second frame transition, the secondary audiovisual content with the playback of the video based on the starting frame of the second frame transition identified as corresponding to the captured second frame.

Corresponding apparatus, systems, and computer-readable media are also within the scope of the disclosure.

These features, along with many others, are discussed in greater detail below.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:

FIG. 1 depicts an example of a computing device that may be used in implementing one or more aspects of the disclosure in accordance with one or more illustrative aspects discussed herein;

FIG. 2 depicts an example computing environment, including a primary device displaying video content and a secondary device displaying secondary audiovisual content, in accordance with one or more aspects of the disclosure;

FIGS. 3A-3B depict a synchronization between the secondary audiovisual content and the playback position of the video, in accordance with one or more illustrative aspects discussed herein;

FIG. 4 depicts an illustrative sequence of frames that make up an exemplary video, and frame transition ranges associated therewith, in accordance with one or more illustrative aspects discussed herein;

FIG. 5 depicts an example of two frames that a frame reference function may determine to be substantially identical even if not identical; and

FIG. 6 depicts a flowchart illustrating a method of synchronizing secondary audiovisual content and a playback position of a video, in accordance with one or more illustrative aspects discussed herein.

DETAILED DESCRIPTION

In the following description of the various embodiments, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration various embodiments in which aspects of the disclosure may be practiced. It is to be understood that other embodiments may be utilized and structural and functional modifications may be made without departing from the scope of the present disclosure. Aspects of the disclosure are capable of other embodiments and of being practiced or being carried out in various ways. Also, it is to be understood that the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. Rather, the phrases and terms used herein are to be given their broadest interpretation and meaning. The use of “including” and “comprising” and variations thereof is meant to encompass the items listed thereafter and equivalents thereof as well as additional items and equivalents thereof.

By way of introduction, aspects discussed herein may relate to methods and techniques for displaying secondary audiovisual content in synchronization with other media content. For example, aspects described herein may provide a system for displaying an augmented reality application, on a secondary device, in synchronization with playback of a music video on a primary device. In a streaming video, multiple consecutive frames may be so similar that a frame reference function provided by an AR toolkit (such as functionality using the ARReferenceImage element from the iOS ARKit) determines them to be substantially identical, even if the content of the frames does differ. Synchronization cues may be taken from determining frame transitions, accounting for the imperfect nature of streaming content and its impact on content matching functions, according to some aspects. Frame transitions that are recognizable by the frame reference function may be predetermined for a video, generating a set of frame transition ranges. Then, the secondary device may synchronize display of secondary content with the video based on detecting a frame transition or based on detecting that a currently captured frame of the video does not align with an expected frame transition range for the current playback position of the secondary audiovisual content. Playback may be synchronized based on a first detected frame, but synchronization may be refined upon detecting a frame transition to a second frame that is no longer substantially identical to prior frames.

Before discussing these concepts in greater detail, however, several examples of a computing device that may be used in implementing and/or otherwise providing various aspects of the disclosure will first be discussed with respect to FIG. 1.

FIG. 1 illustrates one example of a computing device 101 that may be used to implement one or more illustrative aspects discussed herein. For example, computing device 101 may, in some embodiments, implement one or more aspects of the disclosure by reading and/or executing instructions and performing one or more actions based on the instructions. In some embodiments, computing device 101 may represent, be incorporated in, and/or include various devices such as a desktop computer, a computer server, a mobile device (e.g., a laptop computer, a tablet computer, a smart phone, any other types of mobile computing devices, and the like), and/or any other type of data processing device.

Computing device 101 may, in some embodiments, operate in a standalone environment. In others, computing device 101 may operate in a networked environment. As shown in FIG. 1, various network nodes 101, 105, 107, and 109 may be interconnected via a network 103, such as the Internet. Other networks may also or alternatively be used, including private intranets, corporate networks, LANs, wireless networks, personal networks (PAN), and the like. Network 103 is for illustration purposes and may be replaced with fewer or additional computer networks. A local area network (LAN) may have one or more of any known LAN topology and may use one or more of a variety of different protocols, such as Ethernet. Devices 101, 105, 107, 109 and other devices (not shown) may be connected to one or more of the networks via twisted pair wires, coaxial cable, fiber optics, radio waves or other communication media.

As seen in FIG. 1, computing device 101 may include a processor 111, RAM 113, ROM 115, network interface 117, input/output interfaces 119 (e.g., keyboard, mouse, display, printer, etc.), and memory 121. I/O 119 may include a variety of interface units and drives for reading, writing, displaying, and/or printing data or files. I/O 119 may be coupled with a display such as display 120. Memory 121 may store software for configuring computing device 101 into a special purpose computing device in order to perform one or more of the various functions discussed herein. Memory 121 may store operating system software 123 for controlling overall operation of computing device 101, control logic 125 for instructing computing device 101 to perform aspects discussed herein, augmented reality (AR) application 127, and other applications 131. Control logic 125 may be incorporated in and may be a part of AR application 127. In other embodiments, computing device 101 may include two or more of any and/or all of these components (e.g., two or more processors, two or more memories, etc.) and/or other components and/or subsystems not illustrated here.

Devices 105, 107, 109 may have similar or different architecture as described with respect to computing device 101. Those of skill in the art will appreciate that the functionality of computing device 101 (or device 105, 107, 109) as described herein may be spread across multiple data processing devices, for example, to distribute processing load across multiple computers, to segregate transactions based on geographic location, user access level, quality of service (QoS), etc.

One or more aspects discussed herein may be embodied in computer-usable or readable data and/or computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices as described herein. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types when executed by a processor in a computer or other device. The modules may be written in a source code programming language that is subsequently compiled for execution, or may be written in a scripting language such as (but not limited to) HTML or XML. The computer executable instructions may be stored on a computer readable medium such as a hard disk, optical disk, removable storage media, solid state memory, RAM, etc. As will be appreciated by one of skill in the art, the functionality of the program modules may be combined or distributed as desired in various embodiments. In addition, the functionality may be embodied in whole or in part in firmware or hardware equivalents such as integrated circuits, field programmable gate arrays (FPGA), and the like. Particular data structures may be used to more effectively implement one or more aspects discussed herein, and such data structures are contemplated within the scope of computer executable instructions and computer-usable data described herein. Various aspects discussed herein may be embodied as a method, a computing device, a data processing system, or a computer program product.

Having discussed several examples of computing devices which may be used to implement some aspects as discussed further below, discussion will now turn to a method for synchronizing secondary audiovisual content (such as an AR application), by a secondary device, with playback of a video (such as a music video) on a primary device.

Aspects may be implemented, for example, in an application on a secondary device which uses the visual recognition of frames of a music video played by a primary device to collect data on the start of a music video and present a visual, augmented reality and/or digital experience (e.g., secondary audiovisual content corresponding to the music video) through the application. The augmented reality may create an interactive experience to augment the music video, which may be heightened through competition and even may allow users to win various prizes (such as cash or merchandise) based on their ranking within the application. In an example implementation, competitions may begin at a certain time to create a sense of excitement around the launch of a new music video. Users may thus anticipate each upcoming music video due to a sense of urgency or fear of missing out, and this may create new opportunities within the augmented reality world. Augmented reality experiences can extend the music video content by introducing new visual, interactive experiences beyond the music video content. They may introduce different characters, or provide an interactive game on top of the music video. Different games can be provided with different scoring or interactions, and can create excitement and better engage users. These aspects may thus provide a platform to express pop culture in a strong digital form, by integrating exclusive events, news, and an environment where individuals can compete and experience new music and cultural moments. Using augmented reality, aspects described herein may open up a new opportunity for digital concerts and digital experiences which are not always possible due to the physical nature of artists. And with the rise of all-digital artists, where an avatar or other online presence serves as the artist’s persona, an augmented reality platform according to some aspects may improve content producers’ ability to promote, develop, and expand their audience and engagement.

In an example, consider an artist that releases a song on Spotify. The hook of the song could have an audio trigger that opens an additional content platform on the user’s mobile phone. The additional content could, for example, be a game, an alternate version of the song, or exclusive merchandise. Similarly, the artist may release a music video on YouTube. Frames of the video could trigger the additional content platform to open additional content – like a video game – which synchronizes throughout the music video and triggers different aspects of the game. For example, in one use case a user may have a music video playing on a laptop or desktop computer. The user could scan a QR code in the music video to access an AR application on the user’s mobile phone. In these ways and others, the AR application is able to identify which song/video the user is viewing, and can select the appropriate AR experience.

As discussed above, however, existing second screen experiences for television require obtrusive tags and/or codes to be embedded in the video in advance to enable the AR application and substantially synchronize it with the video. This can be troublesome for content producers, as it requires them to modify existing video content and potentially reupload it to streaming platforms, losing views and reputation (such as “likes” and shares). Aspects described herein thus may recognize reference frames in the video without requiring the embedding of coded tags in the video. This may allow for creation of AR experiences for video content after the video content has been released, and could allow for creation by third parties. But, as also discussed above, recognizing reference frames can be imperfect due to the configuration of the frame reference identification function chosen, or due to imperfections introduced by the realities of a streaming platform. For example, a streaming platform may downsample a bitrate or resolution of audiovisual content, which could result in nuanced differences among frames being removed. Or, the contrary could happen, where the act of downsampling introduces differences that are not in the source material.

Frame reference functions, such as functionality utilizing Apple’s ARReferenceImage element from the ARKit for iOS, are configured to account for some variation among images that they otherwise determine to be identical. The frame reference function may, in some implementations, be part of the AR toolkit’s environment and might not be expressly called by the application developer. Instead, the frame reference function may be functionality built into other components of the AR toolkit, such as a component that recognizes when a known reference frame is present in captured video content. For example, in some implementations, the frame reference function may refer to the functionality at work behind the scenes in Apple’s ARKit when an AR session flags a frame-identified event. A set of ARReferenceImages may be used to configure the AR session, and the session may raise an event each time a reference image is identified in the data captured by the camera. As used herein, the frame reference function in this example is the functionality that identifies whether a portion of the captured image corresponds to a reference image. A frame reference function may be configured to retrieve a current frame and compare it to a set of ARReferenceImages to determine if the frame matches a reference image.
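As a brief, hedged illustration of how such an AR session might be configured (a minimal sketch, assuming the reference frames are bundled in an asset catalog group named “VideoFrames”, which is an illustrative name only, not part of any toolkit):

```swift
import ARKit

// A minimal sketch (not a definitive implementation) of configuring ARKit's
// built-in image detection, which supplies the "frame reference function"
// behavior described above. The asset catalog group name "VideoFrames" is
// an assumption for illustration only.
func startFrameDetection(on sceneView: ARSCNView) {
    guard let referenceImages = ARReferenceImage.referenceImages(
        inGroupNamed: "VideoFrames", bundle: nil) else {
        return // no reference frames bundled with the app
    }
    let configuration = ARWorldTrackingConfiguration()
    configuration.detectionImages = referenceImages
    // ARKit invokes delegate callbacks (such as renderer(_:nodeFor:), shown
    // below) each time one of these reference images is recognized in the
    // camera feed.
    sceneView.session.run(configuration)
}
```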

An illustrative implementation of some aspects is provided below, in which the frame reference function is provided by background frame matching functionality in Apple’s ARKit. The below code is provided in Apple’s Swift language. The function “renderer” is called as a delegate when the AR session recognizes a frame within the image captured by the camera, based on a set of ARReferenceImages used to configure the AR session. So once the frame reference functionality in ARKit’s AR session recognizes a known reference frame, the delegate function “renderer” is called and acts on the recognized frame. The detected frame may, according to some aspects, be associated with a known time range within the video.

```swift
import ARKit
import SceneKit

// ARSCNViewDelegate method, called when the AR session adds a node for a
// newly detected anchor.
func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
    // This delegate method can be used for other purposes, so we silently fail if the
    // anchor object being added to the scene cannot be appropriately cast to an
    // ARImageAnchor object, i.e. an image detected by the AR framework.
    if let imageAnchor = anchor as? ARImageAnchor {
        // If code is executing inside this block then an image has been detected.
        if let name = imageAnchor.referenceImage.name {
            // Look at the name of the detected image (e.g. if the image detected was
            // "my_image.png" the variable `name` would be initialized to "my_image").
            // This additional conditional block is really just a sanity check - but
            // could also be useful if you want to detect images (and do stuff with
            // them) that aren't frames.

            // At this point we know not only that a frame has been detected, but also
            // have a reference to that specific frame. So we could look up some
            // additional data associated with that frame (such as a range of
            // timestamps that it might be detected between) and proceed from there.
            let timestampRange = getFrameMetadata(name: name)
            // ... and so on
        }
    }
    // Returning nil lets the framework create a default node for the anchor.
    return nil
}
```

As will be discussed further herein, some aspects may provide for novel techniques of refining the identified time stamp within the video based on detecting a frame transition. For example, an application can check the detected “name” of the recognized frame to determine if a frame transition has occurred.

Embedded codes can be effective means of synchronization because their position is deliberate and known when added. But when a system relies on reference frame matching instead, a sequence of frames that the frame reference function (e.g., frame reference functionality in ARKit based on ARReferenceImages) recognizes as substantially identical can lead to imprecise synchronization. Thus, some aspects described herein may predetermine frame transition ranges of substantially identical frames of a video, and dynamically synchronize secondary audiovisual content with playback of a video based on the frame transition ranges. A secondary device may determine which video is being played, and retrieve the predetermined frame transition ranges. Though the secondary device may be unable to determine a precise playback location when detecting a first frame that is part of a series of substantially identical frames, the secondary device may refine the synchronization of the secondary audiovisual content once a frame transition is detected to a frame that is not substantially identical to a current frame transition range, knowing that the video playback is on a different frame than originally determined by the secondary device.
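A minimal sketch of data structures supporting this approach follows. The type names, the lookup table keyed by reference image name, and the controller are hypothetical illustrations under the assumptions above, not part of ARKit or any other toolkit:

```swift
import Foundation

// Hypothetical representation of one predetermined frame transition range;
// none of these names come from ARKit or any other toolkit.
struct FrameTransitionRange {
    let startingFrameID: Int          // first frame that differs from the prior range
    let endingFrameID: Int            // last frame before the next range begins
    let startTimestamp: TimeInterval  // playback time of the starting frame
    let endTimestamp: TimeInterval    // playback time of the ending frame

    /// Duration of the range; observing one range for longer than this may
    /// indicate that video playback has been paused.
    var duration: TimeInterval { endTimestamp - startTimestamp }
}

// A sketch of refining synchronization when a frame transition is detected,
// assuming a hypothetical table keyed by the reference image name that the
// frame reference function reports.
final class SyncController {
    var rangesByReferenceName: [String: FrameTransitionRange] = [:]
    private(set) var currentRange: FrameTransitionRange?
    private(set) var estimatedPlaybackTime: TimeInterval = 0

    /// Called whenever the frame reference function recognizes a frame.
    func didRecognizeFrame(named name: String) {
        guard let range = rangesByReferenceName[name] else { return }
        if let current = currentRange,
           current.startingFrameID == range.startingFrameID {
            return // still within the same range; nothing new to learn
        }
        // A transition to a new range pins playback to a known timestamp:
        // the starting frame of the newly detected range.
        currentRange = range
        estimatedPlaybackTime = range.startTimestamp
    }
}
```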

FIG. 2 illustrates an example system 200 where a user is viewing a streaming music video 213 via a web browser 211 on a primary device 210 (such as a laptop or desktop computer). The user’s mobile device 220 is displaying an AR application view 221 overlaid on video playback 223 captured by a camera of the mobile device 220. The illustrated AR application has the player controlling a vehicle 225 and presents obstacles 227.

FIGS. 3A and 3B further illustrate the AR application view 221 and how it may be synchronized with playback of the video. As the video playback position advances (and the content of the video changes), the AR application can be synchronized such that events in the AR application occur in sync with corresponding events in the video. For example, FIG. 3A illustrates the initial game state 221 from FIG. 2 corresponding to the video at frame 223. But in FIG. 3B, the video has advanced to a later frame 323 where the content has changed from frame 223. Properly synchronized, the AR application 321 may be intended to display a new obstacle 327 corresponding to the content of frame 323. But if the game is out of synchronization with the video playback position, the proper obstacles may not be displayed corresponding to the current state of the video, which may be a negative experience for the user.

Synchronization can also be important to provide users with the ability to pause, restart, rewind, fast forward, or otherwise control video playback. Such controls can cause the secondary audiovisual content to get out of sync with the video playback, so effective means of re-synchronizing and dynamically improving the synchronization can support these features. For example, the AR application could pause when the secondary device detects that the video has paused, such as when a substantially identical frame has been detected for longer than the expected duration of a frame transition range associated with that frame.

Time synchronization is a significant technical problem in AR applications, and aspects described herein may contribute to improved time synchronization and address shortcomings of the streaming platforms and AR toolkits. Aspects described herein may model the time elapsed within the video and fire events based off of specific timestamps in an improved fashion. While in practice it may not be possible to perfectly synchronize the application with media being played on any device (for example, without using some software bridge between the two devices to keep them in sync), aspects described herein may minimize the potential threshold wherein any desynchronization between the two mediums can occur. Implementations may reduce the distance between the AR application’s internal “event time” and the actual media’s playback time to less than 250 ms (e.g., within +/- 6 frames of a 24 fps video). Because the average human reaction time falls within this range, this may provide a seamless experience for the user. Aspects described further herein may improve the AR application on the secondary device to hold an internal representation of the current elapsed time of the media being played back. Then by detecting key “reference points” – frame transitions – of the media, the secondary device may compute an approximation of the elapsed time throughout the media’s duration and otherwise synchronize in-app events with specific timestamps.

FIG. 4 illustrates content of a set of frames 400 and frame transition ranges 410 associated with an exemplary piece of media content. As illustrated, the media content (of the example) may comprise 20 frames. Frames 1-8 (e.g., with numbers referring to a frame ID within the logical context of the video) may be, for example, a relatively static title screen for the media content as displayed in frame content 421. Frames 9-10 may be an initial scene 422, whereas frames 11-15 comprise an action scene 423 with frame-to-frame movements shown in frames 423a and 423b. Frame 16 may be a closeup frame having content 424, and frames 17-20 may continue the media content.

Frame transition ranges for a video may be used to better synchronize secondary audiovisual content, such as an AR application. The frame transition ranges for the video may be predetermined using a frame reference function, such as an image matching function provided as part of an AR toolkit (e.g., functionality utilizing the ARReferenceImage element in the ARKit for iOS provided by Apple). Each respective frame transition range may comprise a starting frame identifier and an ending frame identifier associated with a series of frames of the video that are each determined to be substantially identical by a predefined frame reference function. Similar-but-not-identical frames may nonetheless be deemed substantially identical by the predefined frame reference function due to how the reference function is configured and due to the nature of streaming video. For example, the predefined frame reference function may determine the contiguous series of frames to be substantially identical based on the streaming file’s omission of these nuanced differences due to, e.g., compression artifacts or quality of the streaming file.

According to some aspects, each frame in the frame transition range may be determined to be substantially identical to each other frame in the frame transition range based on the predefined frame reference function determining that the frames are identical. The set of frame transition ranges may be predetermined based on processing frames of the video to determine series of contiguous frames that are deemed substantially identical by the predefined frame reference function. As used herein, two frames are deemed “substantially identical” when the predefined frame reference function does not discern a substantial difference between the two frames. Aspects herein may utilize a predefined, third-party frame reference identification function. Aspects herein may be used with any suitable frame reference identification function, and the particulars of how the predefined function determines if two frames are substantially identical are a matter of implementation. As discussed further herein, the AR application may rely on a result returned by the frame reference function as a determination of whether two frames are substantially identical or not.

Similar-but-not-identical frames may nonetheless be deemed substantially identical by the predefined frame reference function due to how the reference function is configured. For example, the frame reference function may be configured to consider a portion of the frames but less than all pixels of the frames. Or the frame reference function may sample regions within the frames, perhaps aggregating various pixel blocks to efficiently compute whether the two frames are the same. Similarly, to be robust to various conditions in the image capture (e.g., skew, lighting, obstructions), the frame reference function may be configured to adapt to these conditions such that it may properly determine that two frames are identical even if lighting or other conditions introduce external changes. As a simple example, the frame reference function may be configured to treat two frames as identical if a similarity between the frames is more than a particular threshold, for example 95% the same. In practice, image recognition functions such as a frame reference identification function are much more complex in how they discern whether two frames are substantially identical or not. And in the example of Apple’s ARKit, the frame reference function may be implemented in the background functionality underlying an AR session, utilizing ARReferenceImages to determine when a reference frame is identified for processing by the application delegate functions.
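As a hedged sketch of the threshold idea only (real frame reference functions are proprietary and far more sophisticated), the following illustrative comparison samples grayscale pixel values and applies a configurable match threshold:

```swift
// An illustrative stand-in for a frame reference function's threshold
// comparison; not the internals of any real toolkit. Frames are modeled as
// grayscale pixel arrays. Every Nth pixel is sampled and the match ratio is
// compared to a threshold (e.g., 0.95 for "95% the same").
func substantiallyIdentical(_ a: [UInt8], _ b: [UInt8],
                            sampleStride: Int = 16,
                            threshold: Double = 0.95,
                            pixelTolerance: Int = 8) -> Bool {
    guard a.count == b.count, !a.isEmpty else { return false }
    var matches = 0
    var samples = 0
    for i in stride(from: 0, to: a.count, by: sampleStride) {
        samples += 1
        if abs(Int(a[i]) - Int(b[i])) <= pixelTolerance { matches += 1 }
    }
    return Double(matches) / Double(samples) >= threshold
}
```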

In a given frame transition range, the starting frame identifier may correspond to the first frame of the series of contiguous frames that differs from the frames of a prior transition range based on the predefined frame reference function, and the ending frame identifier may correspond to the last frame of the series of contiguous frames prior to a different frame of a next transition range based on the predefined frame reference function.

In the illustrated example, frames 1-8 may be deemed identical and/or substantially identical (e.g., matching within a certain threshold percentage, such as 99% the same) because each has the same content 421. Thus, frames 1-8 may be determined to belong to a same frame transition range 411. Similarly, frames 9 and 10 may have the same content 422, and may be determined to belong to frame transition range 412.

Frames 11-15, in the example, correspond to an action scene with movements. Generally frames 11-15 have substantially identical content, but subtle variations associated with the movement may exist in content 423a and 423b. Nonetheless, the frame reference function may determine frames 11-15 to be substantially identical. This may be because the frame reference function is configured to consider only portions of the frames, or because the frame comparison techniques overlook certain nuanced, bit-by-bit differences. As mentioned previously, the frame reference function may deem frames to be substantially identical if they match within certain threshold values. And it may be because downsampling or other techniques used to manage streaming video have impacted discernible differences among frames. Because frames 11-15 are determined to be substantially identical, frames 11-15 are determined to belong to frame transition range 413.

Completing the example, frame 16 may be part of frame transition range 414 and have distinct content 424 of a closeup on a character. And frames 17-20 may be part of frame transition range 415 due to similar or identical content.

FIG. 5 depicts another example of two frames 551 and 552 in video content that may be determined to be substantially identical, despite having actually different content. As discussed above, a key difficulty which arises naturally when utilizing a detected frame for determining the elapsed time of the video is the fact that similar frames cannot be reasonably distinguished from one another due to limitations of the frame reference function and/or streaming video applications. More specifically, consecutive frames are typically highly similar. Frame 551 is effectively identical to frame 552 despite being part of an active video with motion.

Frames at the beginning of the video may receive additional processing, as recognizing these frames during video playback may be needed to identify the video and start the corresponding augmented reality experience. In determining the frame transition ranges, the system may label distinct frames with their respective timestamps. The distinct frames may be those that provide a discernible distinction recognizable by the frame reference function. The level of “distinctness” for a given implementation may be tuned to the needs of the application. For example, the frame reference function may be configured with a threshold level of distinctiveness for use in determining whether two consecutive frames are substantially identical or not. Where fine-grained synchronization is needed, and processor power is not a limitation, a low threshold of distinctiveness (such as 99% the same) may be used. In processing-limited scenarios, as another example, a higher threshold of distinctiveness may be used (such as 90% the same) to determine “distinct” frames and frame transitions. Frame transition ranges may be determined based on additional factors, such as frame sampling time, recency of changes, amplitude and volatility in changes over a range of time, and the like.
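To make the predetermination step concrete, the following hedged sketch groups contiguous substantially identical frames into frame transition ranges, reusing the illustrative FrameTransitionRange type and substantiallyIdentical comparison sketched above; the decoded frame array and frame rate parameter are assumptions for illustration:

```swift
// A sketch of predetermining frame transition ranges offline. `frames`
// holds decoded grayscale frames and `fps` is the video frame rate; both
// are assumptions for illustration.
func computeTransitionRanges(frames: [[UInt8]], fps: Double) -> [FrameTransitionRange] {
    guard !frames.isEmpty else { return [] }
    var ranges: [FrameTransitionRange] = []
    var start = 0
    for i in 1..<frames.count where !substantiallyIdentical(frames[i - 1], frames[i]) {
        // Frame i differs from frame i-1, so frame i begins a new range
        // (a frame transition in the terminology above).
        ranges.append(FrameTransitionRange(
            startingFrameID: start, endingFrameID: i - 1,
            startTimestamp: Double(start) / fps,
            endTimestamp: Double(i - 1) / fps))
        start = i
    }
    ranges.append(FrameTransitionRange(
        startingFrameID: start, endingFrameID: frames.count - 1,
        startTimestamp: Double(start) / fps,
        endTimestamp: Double(frames.count - 1) / fps))
    return ranges
}
```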

The predetermined frame transitions and frame transition ranges may be used to synchronize display of secondary audiovisual content, by a secondary device, with playback of a video by a primary device. A camera of the secondary device may capture the displayed video output from the primary device, and may continually process each individual frame in the video buffer to determine whether or not any of the labelled frame transitions or otherwise distinct frames have been detected. If a particular labelled frame / known frame transition is detected, then the AR application on the secondary device may approximate the elapsed time of the video and synchronize events in the AR application (or other secondary audiovisual content) based on the playback time within the video. The playback time estimation may be based on the known frame transition range. For example, on a detected frame transition from one range to another, the AR application may use a starting frame ID of the current frame transition range to determine a playback position in the video. But where a detected frame is not associated with an immediate frame transition, the AR application may not be able to determine where playback is within the frame transition range. Aspects may provide additional criteria for determining an estimated playback time when the current frame is at an indeterminate position within a current frame transition range. For example, the AR application may default to the start of the frame transition range. Or, it may use the midpoint of the range, for example.
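A minimal sketch of such selection criteria, using the illustrative types above (the policy enumeration is a hypothetical name):

```swift
import Foundation

// Illustrative selection criteria for estimating playback position when the
// captured frame sits at an indeterminate point within a frame transition
// range; the policy names are hypothetical.
enum PositionPolicy { case rangeStart, rangeMidpoint }

func estimatePlaybackTime(in range: FrameTransitionRange,
                          policy: PositionPolicy) -> TimeInterval {
    switch policy {
    case .rangeStart:
        return range.startTimestamp
    case .rangeMidpoint:
        return (range.startTimestamp + range.endTimestamp) / 2
    }
}
```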

FIG. 6 illustrates a method 600 for synchronizing secondary content, by a secondary device, with playback of streaming video content on a primary device. For example, method 600 may be used by secondary device 220 of FIG. 2 to synchronize playback of AR application 221 with playback of video 213 by primary device 210.

At step 605, frame transitions and/or frame transition ranges within the video may be predetermined. Each respective frame transition range may comprise a starting frame identifier and an ending frame identifier associated with a series of frames of the video that are each determined to be substantially identical by a predefined frame reference function. The frame reference function may be an image matching functionality provided as part of an AR toolkit, such as functionality utilizing the ARReferenceImage function in the ARKit for iOS provided by Apple. As discussed throughout, similar-but-not-identical frames may nonetheless be deemed substantially identical by the predefined frame reference function due to how the reference function is configured. The predefined frame reference function may determine the contiguous series of frames to be substantially identical based on the streaming file’s omission of these nuanced differences due to, e.g., compression artifacts or quality of the streaming file. For example, the reference function may determine two frames to be substantially identical if they match within a certain threshold percentage.

According to some aspects, each frame in the frame transition range may be determined to be substantially identical to each other frame in the frame transition range based on the predefined frame reference function determining that the frames are identical. The set of frame transition ranges may be predetermined based on processing frames of the video to determine series of contiguous frames that are deemed substantially identical by the predefined frame reference function. In a given frame transition range, the starting frame identifier may correspond to the first frame of the series of contiguous frames that differs from the frames of a prior transition range based on the predefined frame reference function, and the ending frame identifier may correspond to the last frame of the series of contiguous frames prior to a different frame of a next transition range based on the predefined frame reference function.

Additionally and/or alternatively, the system may predetermine frame transitions or other distinct frames within the video. These frame transitions may be labelled and time stamped relative to the video, and similarly used to synchronize playback of the AR application (or other secondary audiovisual content) with the playback of the video.

At step 610, the secondary device may begin the secondary content application. For example, the user may launch the AR application on their mobile device. As another example, the secondary device may detect a triggering event (such as an embedded cue in the music video) that causes the secondary device to launch the AR application.

At step 615, the secondary device may detect corresponding video playback on the primary device. For example, a camera of the secondary device may capture a field of view, and determine whether the captured field of view includes a frame of the video. The secondary device may process the captured frame to determine an identity of the video being watched, so that a suitable AR experience can be launched. Additionally and/or alternatively, the user may select a desired AR experience to be displayed along with the video playback.

At step 620, the secondary device may capture a currently displayed frame of the video playback as displayed on the primary device.

At step 625, the secondary device may determine an initial synchronization between the AR application (secondary audiovisual content) and the video being played back on the primary device. The initial synchronization may be determined based on the predetermined time stamp labels determined to correspond to the captured first frame. Based on the frame reference function, the secondary device may determine that the captured first frame corresponds to a first frame transition range of the set of frame transition ranges that were predetermined for the video. The first frame transition range may be associated with a video timestamp, and this timestamp may be used to synchronize events of the AR application with the video. At an initial point of the experience, the AR application may be launched from the beginning of the experience. But a first detected frame may be used by the AR application to determine an initial synchronization between the app and the video playback. The secondary device may synchronize secondary audiovisual content (such as an augmented reality application) with the playback of the video based on the first frame transition range corresponding to the captured first frame of the video. Synchronizing the secondary audiovisual content may, for example, comprise causing events in the AR application to be displayed on the secondary device when a corresponding event is displayed on the primary device. For example, the secondary device may cause events in an AR application to be displayed in coordination with the current playback position of the video. The AR application may not be able to determine where playback is within the first frame transition range if there have been no other frame transition ranges observed. Aspects may provide additional criteria for determining an estimated playback time when the current frame is at an indeterminate position within a current frame transition range. For example, the AR application may default to the start of the frame transition range. Or, it may use the midpoint of the range, for example.

At step 630, the secondary device may play back the AR application in association with the video playback on the primary device. If playback has not ended at step 635 (no), then the secondary device may advance to step 640 to continually and/or periodically capture an updated currently displayed frame that the primary device is currently displaying to further track and/or revise the synchronization between the AR application and the video playback.

At step 645, the secondary device may determine whether the captured second frame corresponds to a current (expected) frame transition range corresponding to a current playback position of the secondary audiovisual content based on the predefined frame reference function. This may comprise using results of the frame reference function to determine if the current frame is substantially identical to an expected frame for the current (expected) frame transition range associated with the current playback position of the AR application. If the current frame is identical to the frame expected based on the expected position within the video, playback may continue and the method returns to step 630. If the current frame is not substantially identical to the expected frame for the current playback position, the method proceeds to step 650.

At step 650, the secondary device may determine a frame transition corresponding to the captured current frame. This may be based on the frame reference function determining that the captured frame is not substantially identical to a prior frame transition range of the video. The AR application may, in some implementations, assume that the captured frame belongs to a next frame transition range. In other implementations the AR application may determine which frame transition range the captured frame belongs to based on the frame reference function.

At step 655, the secondary device may update synchronization between the AR application and the video playback based on the determined frame transition range corresponding to the captured current frame. For example, the secondary device may determine that the captured frame indicates that video playback is at a playback position that is known to correspond to the determined frame transition range from the predetermining in step 605. This comparison may include a reasonable buffer tuned to implementation needs: the secondary device may compare the AR application's internal estimate of the video playback position (an internal event clock) to the known timestamp associated with the captured second frame to determine whether the AR application and video playback are out of sync. If the difference between the AR playback position and the video playback position is less than a threshold margin, playback may continue. Otherwise, according to some aspects, the secondary device may adjust the timing of the AR application to more closely synchronize with the video playback.
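The drift check of step 655 might look like the following sketch. The 0.25-second threshold is an arbitrary stand-in for the implementation-tuned buffer mentioned above, and the names are hypothetical:

# Illustrative sketch only -- threshold and names are assumptions.
def update_synchronization(ar_event_time: float,
                           video_time: float,
                           threshold: float = 0.25) -> float:
    """Compare the AR application's internal event clock against the known
    timestamp of the captured frame; adopt the video timestamp only when
    the drift exceeds the threshold margin."""
    drift = abs(ar_event_time - video_time)
    if drift < threshold:
        return ar_event_time  # within the buffer: playback continues as-is
    return video_time         # out of sync: snap to the video's known position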

In some implementations, synchronizing the secondary audiovisual content may be based on audio output by the primary device and associated with the playback of the video. For example, due to similar limitations in Internet streaming of audio content, the video frame transition synchronization techniques described above may be adapted to match reference audio portions captured by the secondary device. This audio synchronization method may be utilized to, e.g., display an AR application in synchrony with streamed music. This audio synchronization may also be used in conjunction with the video synchronization to determine a more accurate result.

Several methods are available to include audio synchronization to further refine the synchronization of the AR application and video playback. Some are "invasive" in the sense that they may require manipulation of the audio file with which the AR application is intending to synchronize. For example, the "reference points" that could be used are high frequency tones (>20 kHz) and/or steganographically embedded data, each of which would encode the exact timestamps of where they occur in the audio track. But each of these approaches may encounter technical challenges due to the common practice of hosting platforms (e.g., YouTube) re-encoding uploaded media. Another, non-invasive method is to generate a spectrogram of the audio in real time while simultaneously splitting this image into contiguous "frames". This technique maps directly onto the video synchronization technique outlined above, as the audio frames could be analyzed for distinctness in the same manner and monitored to determine synchrony between the AR application and the video playback.
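A sketch of the non-invasive spectrogram approach is shown below, using scipy.signal.spectrogram. The spectrogram window parameters are the library defaults, and frame_seconds is an illustrative assumption rather than a value from the specification:

# Illustrative sketch only -- frame duration is an assumption.
import numpy as np
from scipy.signal import spectrogram

def audio_frames(samples: np.ndarray, sample_rate: int,
                 frame_seconds: float = 0.5) -> list[np.ndarray]:
    """Split a spectrogram of `samples` into contiguous "audio frames",
    each covering roughly `frame_seconds` of audio, so they can be
    compared for distinctness like video frames."""
    freqs, times, sxx = spectrogram(samples, fs=sample_rate)
    cols_per_frame = max(1, int(frame_seconds / (times[1] - times[0])))
    return [sxx[:, i:i + cols_per_frame]
            for i in range(0, sxx.shape[1], cols_per_frame)]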

According to some aspects, the captured current frame may correspond to a frame transition period prior to where the AR application expects playback to be. This may indicate that the AR application is ahead in time and needs to be delayed to allow the video to catch up. But it may also indicate that the video is paused, and the AR application should pause as well. Thus, aspects may further comprise the secondary device determining that the time period between when a frame of this frame transition range was first captured and when the latest frame of the frame transition range was captured exceeds the total duration of the frame transition range. This may indicate that the video has been paused, as the video should not remain on the same frame for longer than the predetermined frame transition range. Based on determining that the time period exceeds the duration of the frame transition range, the secondary device may pause playback of the secondary audiovisual content. While playback of the secondary audiovisual content is paused, the secondary device may continue to capture currently displayed frames of the video during the playback of the video by the primary device (which may be paused). The secondary device may determine that a captured frame corresponds to a different frame transition range, other than the frame transition range that indicated the video was paused, based on the predefined frame reference function. For example, the secondary device may determine that a new, distinct frame appears after the frame did not change for a while. Based on determining that the captured frame corresponds to a different frame transition range, the secondary device may resume playback of the secondary audiovisual content based on the different frame transition range.
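The pause heuristic can be stated compactly. The following sketch assumes the caller tracks when the current frame transition range was first and most recently observed; the names are hypothetical:

# Illustrative sketch only -- names are assumptions.
def is_video_paused(first_seen: float, last_seen: float,
                    range_duration: float) -> bool:
    """Return True when the same transition range has been observed for
    longer than its predetermined duration, indicating a paused video."""
    return (last_seen - first_seen) > range_duration

When this check returns True, the secondary content would pause; recognizing a frame from any other transition range would then trigger the resume path described above.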

Once synchronization is updated, if necessary, processing returns to step 630 and playback of the secondary content continues, with continuous/periodic returns to step 640 to check a currently displayed video frame for updates. Once playback ends, method 600 is complete.

The frame transition detection and synchronization may be thought of as (and implemented using) a state machine, according to some aspects. As mentioned above, the system may predetermine the earliest and latest frames, within an interval, for which the image recognition software (frame reference function, e.g., ARKit's frame identification functionality that identifies a reference frame in captured image data) will recognize them as identical. Then, within the AR application, a state machine may be employed which contains the following states and transitions (according to an implementation): no image being detected, entering image detection, image continually being detected, and leaving image detection. The states are the simplest part to explain: they merely maintain themselves and check whether or not a transition has occurred. Transitions are used roughly the same way among the different states: they collect a distribution of real-time data which encapsulates how far away the AR application's internal "event time" is from the actual elapsed time in the video.
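One possible (hypothetical) encoding of these four states and their transitions follows; the state names mirror the text above, and the transition logic is an assumption about one reasonable implementation:

# Illustrative sketch only -- state names and transition rules are assumptions.
from enum import Enum, auto

class DetectionState(Enum):
    NO_IMAGE = auto()            # no reference frame currently detected
    ENTERING_DETECTION = auto()  # a reference frame was just recognized
    DETECTING = auto()           # the same frame is continually recognized
    LEAVING_DETECTION = auto()   # the frame just stopped being recognized

def next_state(state: DetectionState, frame_recognized: bool) -> DetectionState:
    """Advance the state machine given whether the current camera frame was
    recognized by the frame reference function."""
    if frame_recognized:
        if state in (DetectionState.ENTERING_DETECTION, DetectionState.DETECTING):
            return DetectionState.DETECTING
        return DetectionState.ENTERING_DETECTION
    if state in (DetectionState.LEAVING_DETECTION, DetectionState.NO_IMAGE):
        return DetectionState.NO_IMAGE
    return DetectionState.LEAVING_DETECTION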

Upon a state transition, if the event time falls within the lower and upper bounds associated with the most recently detected frame, then the AR application may be assumed to be relatively in sync, and the AR application continues to run as normal. However, if the event time is detected to fall outside of this interval, then the system may calculate the least squares line using the collected distribution of errors (as well as appending the next idealized timestamp to this distribution). The coefficients of this equation are then used to recalibrate the tick rate of the AR application's timer and to offset the current event time. This technique has the potential added benefit of being able to dynamically align itself with the frames per second of the media depending on the platform it is being hosted on. For example, YouTube often re-encodes videos to be played back at 30 fps. But if the frames per second of the streaming video is known in advance, this may be further utilized in initializing the algorithm and may help minimize any initial desynchronization which can occur.
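For reference, the "least squares line" here is the standard ordinary least-squares fit. Writing the collected error distribution as $N$ points $(x_i, y_i)$, where $x_i$ is an observed event time and $y_i$ the corresponding idealized timestamp (including the appended next idealized point), the slope and intercept are:

\[
a = \frac{N\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{N\sum_i x_i^2 - \left(\sum_i x_i\right)^2},
\qquad
b = \frac{\sum_i y_i - a\sum_i x_i}{N}.
\]

Consistent with the pseudocode below, the slope $a$ recalibrates the timer's tick rate and the intercept $b$ offsets the current event time.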

Having described aspects of the invention in detail, below is exemplary pseudocode with explanatory comments detailing an algorithm for an implementation of some aspects described herein.

// Check if a pixel buffer "looks enough like" a known frame,
// returning the frame's id if a match is found, else NULL
function RecognizeFrame(frame) -> String?

// Maps each frame id to the smallest and largest event time at which it
// has been observed to occur -- fixed ahead of time
frame_bounds : Map<String, Array<Int>>

// Global variables
internal_time : Float = 0
event_tickrate : Float = 1 / 60
event_time : Int = 0
wf_count : Int = 0
witnessed_frames : Array<String> = []
witness_index : Int = 0
witnessed_timestamps : Array<Int> = []
is_witnessing : Boolean = false
current_id : String
dist_index : Int = 0
error_distribution : Array<Array<Int>> = []

// Called by our timer 60 times per second
function TimerCallback()
{
    internal_time += 1 / 60
    if internal_time >= event_tickrate
    {
        event_time += 1
        internal_time = 0
        // Executes event with id 'i' iff event_time == i
        ExecuteEvent(event_time)
    }
}

// Called once for each frame received by the camera. Collect an array of all
// timestamps wherein a particular frame has been observed -- consuming them
// once a state transition has occurred, i.e.
// (Frame A witnessed -> Frame B witnessed)
// (Frame A witnessed -> No frame witnessed)
function WitnessedFrameTimestamp(frame)
{
    recognized_id = RecognizeFrame(frame)
    if recognized_id != NULL
    {
        // Once at least two distinct sequences of frames have been observed,
        // assume that media playback has occurred and start the timer used
        // for triggering events
        if wf_count < 2 and recognized_id not in witnessed_frames
        {
            if wf_count == 1
            {
                // Create a timer which calls the function
                // 'TimerCallback' 60 times per second
                CreateTimer(TimerCallback, 60)
                // Forcefully set the initial event time to the lower bound
                // of the current frame being witnessed
                event_time = frame_bounds[recognized_id][0]
            }
            witnessed_frames[wf_count] = recognized_id
            wf_count += 1
        }
        // Transition to a different frame: consume the timestamps collected
        // for the previous frame
        if recognized_id != current_id and witness_index > 0
        {
            ConsumeWitnessedTimestamps(current_id, witness_index,
                witnessed_timestamps)
            witness_index = 0
            witnessed_timestamps = []
        }
        is_witnessing = true
        current_id = recognized_id
        witnessed_timestamps[witness_index] = event_time
        witness_index += 1
    }
    else
    {
        is_witnessing = false
        // Transition from a frame to no frame: consume the collected
        // timestamps
        if witness_index > 0
        {
            ConsumeWitnessedTimestamps(current_id, witness_index,
                witnessed_timestamps)
            witness_index = 0
            witnessed_timestamps = []
        }
    }
}

// Update error distribution and update tick rate if applicable
function ConsumeWitnessedTimestamps(id, num_witnessed, timestamps)
{
    lower_bound = frame_bounds[id][0]
    upper_bound = frame_bounds[id][1]
    first = timestamps[0]
    last = timestamps[num_witnessed - 1]
    lower_difference = lower_bound - first
    upper_difference = last - upper_bound
    if upper_difference > 0
    {
        // Event clock ran past the latest time this frame should appear
        event_time -= upper_difference
        error_distribution[dist_index] = [ last, upper_bound ]
        dist_index += 1
        UpdateTickrate(error_distribution, id)
    }
    else if lower_difference > 0
    {
        // Event clock had not yet reached the earliest time this frame
        // should appear
        event_time += lower_difference
        error_distribution[dist_index] = [ lower_bound, first ]
        dist_index += 1
        UpdateTickrate(error_distribution, id)
    }
}

// Least squares fit of observed event times against idealized timestamps;
// the slope rescales the tick rate, the intercept offsets the event time
function UpdateTickrate(distribution, id)
{
    X = Y = Xs = XY = 0
    N = 1   // pre-counts the idealized point appended below
    for [ x, y ] in distribution
    {
        X += x
        Y += y
        Xs += x * x
        XY += x * y
        N += 1
    }
    // Append the lower bound for the frame id following 'id' as the next
    // idealized point (x == y == next_lower)
    next_lower = NextFrameBounds(id)
    X += next_lower
    Y += next_lower
    Xs += next_lower * next_lower
    XY += next_lower * next_lower
    denominator = (N * Xs - X * X)
    if denominator != 0
    {
        a = (N * XY - X * Y) / denominator
        b = (Y - a * X) / N
        event_tickrate = a / 60
        event_time -= Floor(b)
    }
}

Aspects described herein are not limited only to video content. Many of the same issues with video content and downsampling/compression are present in audio content as well. Some implementations may synchronize secondary audiovisual content with captured audio based on predetermined transition portions of audio. And audio feedback can be used in conjunction with the video capture to provide a further enhanced synchronized experience for the user. Aspects described herein are also not limited to music videos; any suitable video, audiovisual content, or other suitable content that can be divided into logical frames may be processed and analyzed according to the techniques described above to determine synchronization timings for an AR application or other secondary audiovisual content. Further, the secondary audiovisual content need not include both audio and visual elements. As used herein, the secondary audiovisual content may include audio content, video content, and/or combinations thereof.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

1. (canceled)
 2. (canceled)
 3. (canceled)
4. A method of determining a set of frame transition ranges for a video, wherein the set of frame transition ranges are used by a user device to control synchronized display of secondary audiovisual content associated with the video, the method comprising: processing frames of the video to determine a plurality of series of contiguous frames that are deemed substantially identical by a predefined frame reference function; determining the set of frame transition ranges for the video based on the plurality of series of contiguous frames, wherein a given frame transition range corresponds to a respective series of contiguous frames that are deemed substantially identical by the predefined frame reference function and comprises: a respective starting frame identifier of the frame transition range, corresponding to the first frame of the series of contiguous frames that differs from the frames of a prior transition range based on the predefined frame reference function, and a respective ending frame identifier of the frame transition range, corresponding to the last frame of the series of contiguous frames prior to a different frame of a next transition range based on the predefined frame reference function; and transmitting the set of frame transition ranges in association with the video.
5. The method of claim 4, wherein the video is a streaming video.
6. The method of claim 5, further comprising: generating the streaming video, wherein the streaming video comprises a compressed version of an original video.
7. The method of claim 5, further comprising: generating the streaming video, wherein the streaming video comprises a lower resolution version of an original video.
8. The method of claim 5, wherein the predefined frame reference function determines the contiguous series of frames to be substantially identical based on at least one difference, between the streaming video and a corresponding original video, that was caused by generating the streaming video.
9. The method of claim 5, wherein the predefined frame reference function determines the contiguous series of frames to be substantially identical based on compression artifacts associated with generating the streaming video.
10. The method of claim 5, wherein the predefined frame reference function determines the contiguous series of frames to be substantially identical based on compression artifacts associated with transmitting the streaming video.
11. The method of claim 5, wherein: a first frame and a second frame of the streaming video are deemed substantially identical by the predefined frame reference function, and a corresponding first frame and a corresponding second frame of a corresponding original video are not substantially identical.
12. The method of claim 4, wherein processing the frames of the video to determine a given series of contiguous frames that are deemed substantially identical by a predefined frame reference function comprises: processing a first frame of the video and a subsequent second frame of the video using the predefined frame reference function to determine whether the first frame and the second frame are deemed substantially identical by the predefined frame reference function; based on determining that the first frame and the second frame are deemed substantially identical by the predefined frame reference function, continuing to process one or more next frames subsequent to the second frame to determine whether each next frame is deemed substantially identical to a prior frame by the predefined frame reference function; and determining a last frame of the series of contiguous frames based on determining that a next frame is not deemed substantially identical to a prior frame by the predefined frame reference function, wherein the series of contiguous frames includes each frame deemed substantially identical by the predefined frame reference function.
13. The method of claim 4, wherein the predefined frame reference function is provided by an augmented reality toolkit and is configured to compare a portion of a captured image to a reference image.
14. The method of claim 4, wherein each frame in a respective frame transition range is determined to be substantially identical to each other frame in the frame transition range based on the predefined frame reference function determining that the frames are identical.
15. The method of claim 4, wherein the secondary audiovisual content comprises an augmented reality application corresponding to the video.
16. A method of determining a set of frame transitions for a video, wherein the set of frame transitions are used by a user device to control synchronized display of secondary audiovisual content associated with the video, the method comprising: processing frames of the video to determine a plurality of transition frames that are deemed to be different from a prior frame based on a predefined frame reference function; determining the set of frame transitions for the video based on the plurality of transition frames, wherein a given frame transition corresponds to a respective series of contiguous frames that are deemed substantially identical by the predefined frame reference function and comprises: a respective starting frame identifier of the frame transition, corresponding to the first frame of the series of contiguous frames that differs from the frames of a prior series of contiguous frames based on the predefined frame reference function; and transmitting the set of frame transitions in association with the video.
17. The method of claim 16, wherein the video is a streaming video, wherein the streaming video comprises a compressed version or a lower resolution version of an original video.
18. The method of claim 16, wherein the predefined frame reference function determines the contiguous series of frames to be substantially identical based on at least one difference, between the streaming video and a corresponding original video, that was caused by generating the streaming video.
19. The method of claim 16, wherein the predefined frame reference function determines the contiguous series of frames to be substantially identical based on compression artifacts associated with generating the streaming video or associated with transmitting the streaming video.
20. The method of claim 16, wherein: the secondary audiovisual content comprises an augmented reality application corresponding to the video, and the predefined frame reference function is provided by an augmented reality toolkit, associated with the augmented reality application, and is configured to compare a portion of a captured image to a reference image.
21. A method of determining a set of frame transition ranges for an audio track, wherein the set of frame transition ranges are used by a user device to control synchronized display of secondary audiovisual content associated with the audio track, the method comprising: processing audio frames of the audio track to determine a plurality of series of contiguous frames that are deemed substantially identical by a predefined frame reference function, wherein processing a given audio frame of the audio track comprises generating, as the given audio frame, a spectrogram based on a first portion of the audio track; determining the set of frame transition ranges for the audio track based on the plurality of series of contiguous frames, wherein a given frame transition range corresponds to a respective series of contiguous frames that are deemed substantially identical by the predefined frame reference function and comprises: a respective starting frame identifier of the frame transition range, corresponding to the first frame of the series of contiguous frames that differs from the frames of a prior transition range based on the predefined frame reference function, and a respective ending frame identifier of the frame transition range, corresponding to the last frame of the series of contiguous frames prior to a different frame of a next transition range based on the predefined frame reference function; and transmitting the set of frame transition ranges in association with the audio track.
22. The method of claim 21, wherein the audio track is a streaming audio track, wherein the streaming audio track comprises a compressed version or a lower resolution version of an original audio track.
23. The method of claim 21, wherein the predefined frame reference function determines the contiguous series of frames to be substantially identical based on at least one difference, between the streaming audio track and a corresponding original audio track, that was caused by generating the streaming audio track.